Overview

Brought to you by YData

Dataset statistics

Number of variables: 20
Number of observations: 36,457
Missing cells: 11,323
Missing cells (%): 1.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 21.3 MiB
Average record size in memory: 612.3 B
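All 11,323 missing cells sit in a single column: OCCUPATION_TYPE reports exactly 11,323 missing values further down. The 1.6% figure is that count divided by every cell in the table (rows × columns), which a short Python check confirms. The function name is illustrative and not part of ydata-profiling:

```python
def missing_cell_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Percentage of all cells (rows x columns) that are missing."""
    return 100.0 * missing_cells / (n_rows * n_cols)

# Figures copied from the dataset statistics above.
pct = missing_cell_pct(missing_cells=11_323, n_rows=36_457, n_cols=20)
print(f"{pct:.1f}%")  # 1.6%

# The same 11,323 cells are 31.1% of the OCCUPATION_TYPE column on its own.
print(f"{100.0 * 11_323 / 36_457:.1f}%")  # 31.1%
```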

Variable types

Numeric: 7
Categorical: 11
Boolean: 2

Alerts

FLAG_MOBIL has constant value "1" [Constant]
CNT_CHILDREN is highly correlated overall with CNT_FAM_MEMBERS [High correlation]
CNT_FAM_MEMBERS is highly correlated overall with CNT_CHILDREN [High correlation]
CODE_GENDER is highly correlated overall with OCCUPATION_TYPE [High correlation]
DAYS_EMPLOYED is highly correlated overall with NAME_INCOME_TYPE and 1 other field [High correlation]
NAME_INCOME_TYPE is highly correlated overall with DAYS_EMPLOYED [High correlation]
OCCUPATION_TYPE is highly correlated overall with CODE_GENDER and 1 other field [High correlation]
NAME_EDUCATION_TYPE is highly imbalanced (50.6%) [Imbalance]
NAME_HOUSING_TYPE is highly imbalanced (73.1%) [Imbalance]
FLAG_EMAIL is highly imbalanced (56.4%) [Imbalance]
is_high_risk is highly imbalanced (91.6%) [Imbalance]
OCCUPATION_TYPE has 11,323 (31.1%) missing values [Missing]
ID values are all unique [Unique]
CNT_CHILDREN has 25,201 (69.1%) zeros [Zeros]
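The imbalance percentages above are reproduced by a normalized-entropy score, 1 − H(p)/ln k, where p is the class distribution and k the number of distinct values: plugging in the value counts reported further down for is_high_risk (36,075 / 382) and FLAG_EMAIL (33,186 / 3,271) gives 91.6% and 56.4%. Whether ydata-profiling computes it in exactly this way is an assumption; the sketch below only illustrates the formula:

```python
import math

def imbalance_score(counts: list[int]) -> float:
    """1 - normalized Shannon entropy of a categorical distribution.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    k = len(counts)
    if k <= 1:
        return 1.0  # a single class is treated as maximally imbalanced (assumed convention)
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

print(f"is_high_risk: {imbalance_score([36_075, 382]):.1%}")    # 91.6%
print(f"FLAG_EMAIL:   {imbalance_score([33_186, 3_271]):.1%}")  # 56.4%
```

Normalizing by ln k keeps the score between 0 and 1 regardless of how many categories a column has, which is what makes a two-level flag and a five-level education field comparable on one scale.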

Reproduction

Analysis started: 2025-05-10 10:34:41.840313
Analysis finished: 2025-05-10 10:34:50.493082
Duration: 8.65 seconds
Software version: ydata-profiling v4.16.1
Configuration: config.json

Variables

ID
Real number (ℝ)

Unique 

Distinct: 36457
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5078227
Minimum: 5008804
Maximum: 5150487
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 5008804
5th percentile: 5018456.6
Q1: 5042028
Median: 5074614
Q3: 5115396
95th percentile: 5146024.2
Maximum: 5150487
Range: 141683
Interquartile range (IQR): 73368

Descriptive statistics

Standard deviation: 41875.241
Coefficient of variation (CV): 0.0082460356
Kurtosis: -1.2126137
Mean: 5078227
Median Absolute Deviation (MAD): 38093
Skewness: 0.08624229
Sum: 1.8513692 × 10¹¹
Variance: 1.7535358 × 10⁹
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value Count Frequency (%)
5008804 1
 
< 0.1%
5096993 1
 
< 0.1%
5096983 1
 
< 0.1%
5096987 1
 
< 0.1%
5096988 1
 
< 0.1%
5096990 1
 
< 0.1%
5096991 1
 
< 0.1%
5096992 1
 
< 0.1%
5096994 1
 
< 0.1%
5096978 1
 
< 0.1%
Other values (36447) 36447
> 99.9%

Minimum 10 values
Value Count Frequency (%)
5008804 1
< 0.1%
5008805 1
< 0.1%
5008806 1
< 0.1%
5008808 1
< 0.1%
5008809 1
< 0.1%
5008810 1
< 0.1%
5008811 1
< 0.1%
5008812 1
< 0.1%
5008813 1
< 0.1%
5008814 1
< 0.1%

Maximum 10 values
Value Count Frequency (%)
5150487 1
< 0.1%
5150485 1
< 0.1%
5150484 1
< 0.1%
5150483 1
< 0.1%
5150482 1
< 0.1%
5150481 1
< 0.1%
5150480 1
< 0.1%
5150479 1
< 0.1%
5150478 1
< 0.1%
5150477 1
< 0.1%
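The coefficient of variation (CV) in the ID statistics is the standard deviation divided by the mean, so it is dimensionless and only meaningful for strictly positive data. A quick Python check with the ID figures (the helper name is illustrative):

```python
def coefficient_of_variation(std: float, mean: float) -> float:
    """CV = standard deviation / mean (undefined for a zero mean)."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std / mean

print(f"{coefficient_of_variation(41875.241, 5078227):.4g}")  # 0.008246
```

The same ratio explains why DAYS_BIRTH, whose values are all negative, reports a negative CV of -0.26294237 (4200.5499 / -15975.173): the sign comes from the mean, not from the spread.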

CODE_GENDER
Categorical

High correlation 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
F
24430 
M
12027 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 36,457
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: M
3rd row: M
4th row: F
5th row: F

Common Values

ValueCountFrequency (%)
F 24430
67.0%
M 12027
33.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
f 24430
67.0%
m 12027
33.0%

Most occurring characters

ValueCountFrequency (%)
F 24430
67.0%
M 12027
33.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 36457
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
F 24430
67.0%
M 12027
33.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 36457
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
F 24430
67.0%
M 12027
33.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 36457
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
F 24430
67.0%
M 12027
33.0%

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 35.7 KiB
False
22614 
True
13843 
ValueCountFrequency (%)
False 22614
62.0%
True 13843
38.0%

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 35.7 KiB
True
24506 
False
11951 
ValueCountFrequency (%)
True 24506
67.2%
False 11951
32.8%

CNT_CHILDREN
Real number (ℝ)

High correlation  Zeros 

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.43031517
Minimum: 0
Maximum: 19
Zeros: 25201
Zeros (%): 69.1%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 2
Maximum: 19
Range: 19
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7423669
Coefficient of variation (CV): 1.7251702
Kurtosis: 22.562434
Mean: 0.43031517
Median Absolute Deviation (MAD): 0
Skewness: 2.5693822
Sum: 15688
Variance: 0.55110862
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)

Common values
Value Count Frequency (%)
0 25201
69.1%
1 7492
 
20.6%
2 3256
 
8.9%
3 419
 
1.1%
4 63
 
0.2%
5 20
 
0.1%
14 3
 
< 0.1%
7 2
 
< 0.1%
19 1
 
< 0.1%

Minimum 10 values
Value Count Frequency (%)
0 25201
69.1%
1 7492
 
20.6%
2 3256
 
8.9%
3 419
 
1.1%
4 63
 
0.2%
5 20
 
0.1%
7 2
 
< 0.1%
14 3
 
< 0.1%
19 1
 
< 0.1%

Maximum 10 values
Value Count Frequency (%)
19 1
 
< 0.1%
14 3
 
< 0.1%
7 2
 
< 0.1%
5 20
 
0.1%
4 63
 
0.2%
3 419
 
1.1%
2 3256
 
8.9%
1 7492
 
20.6%
0 25201
69.1%
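CNT_CHILDREN has only nine distinct values, so its descriptive statistics can be recomputed from the value-count table alone. The Python sketch below (helper name illustrative) reproduces the reported count, sum and mean; dividing the sum of squared deviations by n − 1 is what matches the reported variance of 0.55110862, which suggests the report uses the sample variance.

```python
import math

# CNT_CHILDREN value -> count, copied from the tables above.
CNT_CHILDREN_COUNTS = {0: 25201, 1: 7492, 2: 3256, 3: 419, 4: 63, 5: 20, 7: 2, 14: 3, 19: 1}

def moments_from_counts(counts: dict[int, int]) -> tuple[int, int, float, float]:
    """Return (n, sum, mean, sample variance) for a discrete variable given value -> count."""
    n = sum(counts.values())
    total = sum(value * count for value, count in counts.items())
    mean = total / n
    squared_dev = sum(count * (value - mean) ** 2 for value, count in counts.items())
    return n, total, mean, squared_dev / (n - 1)  # n - 1 denominator: sample variance

n, total, mean, var = moments_from_counts(CNT_CHILDREN_COUNTS)
print(n, total)  # 36457 15688
print(f"mean={mean:.8f} variance={var:.8f} std={math.sqrt(var):.7f}")
```

Applied to the CNT_FAM_MEMBERS counts, the same helper returns that variable's reported sum of 80,149.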

AMT_INCOME_TOTAL
Real number (ℝ)

Distinct: 265
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 186685.74
Minimum: 27000
Maximum: 1575000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 27000
5th percentile: 76500
Q1: 121500
Median: 157500
Q3: 225000
95th percentile: 360000
Maximum: 1575000
Range: 1548000
Interquartile range (IQR): 103500

Descriptive statistics

Standard deviation: 101789.23
Coefficient of variation (CV): 0.54524373
Kurtosis: 17.598084
Mean: 186685.74
Median Absolute Deviation (MAD): 45000
Skewness: 2.7390099
Sum: 6.8060019 × 10⁹
Variance: 1.0361047 × 10¹⁰
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value Count Frequency (%)
135000 4309
 
11.8%
180000 3097
 
8.5%
157500 3089
 
8.5%
112500 2956
 
8.1%
225000 2926
 
8.0%
202500 2192
 
6.0%
90000 1769
 
4.9%
270000 1675
 
4.6%
315000 1001
 
2.7%
67500 873
 
2.4%
Other values (255) 12570
34.5%

Minimum 10 values
Value Count Frequency (%)
27000 3
 
< 0.1%
29250 7
< 0.1%
30150 3
 
< 0.1%
31500 16
< 0.1%
31531.5 3
 
< 0.1%
31950 1
 
< 0.1%
32400 5
 
< 0.1%
33300 10
< 0.1%
33750 1
 
< 0.1%
36000 5
 
< 0.1%

Maximum 10 values
Value Count Frequency (%)
1575000 8
 
< 0.1%
1350000 6
 
< 0.1%
1125000 3
 
< 0.1%
990000 4
 
< 0.1%
945000 4
 
< 0.1%
900000 39
0.1%
810000 15
 
< 0.1%
787500 5
 
< 0.1%
765000 9
 
< 0.1%
742500 5
 
< 0.1%
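AMT_INCOME_TOTAL is strongly right-skewed (skewness 2.74, maximum 1,575,000 against a median of 157,500). One common, tool-independent way to flag such a tail is Tukey's rule, with fences 1.5 × IQR beyond the quartiles. Applied to the reported quartiles (Q1 = 121,500, Q3 = 225,000), the upper fence is 380,250; since the 95th percentile (360,000) lies below it, fewer than 5% of incomes fall outside. The 1.5 multiplier is the conventional choice, not something this report uses, and the helper name is illustrative:

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = tukey_fences(121_500, 225_000)
print(lower, upper)  # -33750.0 380250.0
```

The lower fence is negative, so with a minimum income of 27,000 only the upper fence matters for this variable.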

NAME_INCOME_TYPE
Categorical

High correlation 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB
Working
18819 
Commercial associate
8490 
Pensioner
6152 
State servant
2985 
Student
 
11

Length

Max length: 20
Median length: 7
Mean length: 10.856159
Min length: 7

Characters and Unicode

Total characters: 395,783
Distinct characters: 21
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Working
2nd row: Working
3rd row: Working
4th row: Commercial associate
5th row: Commercial associate

Common Values

ValueCountFrequency (%)
Working 18819
51.6%
Commercial associate 8490
23.3%
Pensioner 6152
 
16.9%
State servant 2985
 
8.2%
Student 11
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
working 18819
39.3%
commercial 8490
17.7%
associate 8490
17.7%
pensioner 6152
 
12.8%
state 2985
 
6.2%
servant 2985
 
6.2%
student 11
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
i 41951
10.6%
o 41951
10.6%
r 36446
 
9.2%
e 35265
 
8.9%
n 34119
 
8.6%
a 31440
 
7.9%
s 26117
 
6.6%
W 18819
 
4.8%
k 18819
 
4.8%
g 18819
 
4.8%
Other values (11) 92037
23.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 347851
87.9%
Uppercase Letter 36457
 
9.2%
Space Separator 11475
 
2.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 41951
12.1%
o 41951
12.1%
r 36446
10.5%
e 35265
10.1%
n 34119
9.8%
a 31440
9.0%
s 26117
7.5%
k 18819
5.4%
g 18819
5.4%
t 17467
 
5.0%
Other values (6) 45457
13.1%
Uppercase Letter
ValueCountFrequency (%)
W 18819
51.6%
C 8490
23.3%
P 6152
 
16.9%
S 2996
 
8.2%
Space Separator
ValueCountFrequency (%)
(space) 11475
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 384308
97.1%
Common 11475
 
2.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 41951
10.9%
o 41951
10.9%
r 36446
9.5%
e 35265
9.2%
n 34119
8.9%
a 31440
 
8.2%
s 26117
 
6.8%
W 18819
 
4.9%
k 18819
 
4.9%
g 18819
 
4.9%
Other values (10) 80562
21.0%
Common
ValueCountFrequency (%)
(space) 11475
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 395783
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 41951
10.6%
o 41951
10.6%
r 36446
 
9.2%
e 35265
 
8.9%
n 34119
 
8.6%
a 31440
 
7.9%
s 26117
 
6.6%
W 18819
 
4.8%
k 18819
 
4.8%
g 18819
 
4.8%
Other values (11) 92037
23.3%
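The "Characters and Unicode" tables group characters by Unicode general category. Python's standard `unicodedata.category` returns the same two-letter category codes (Lu, Ll, Zs, ...), so the category totals for NAME_INCOME_TYPE can be recomputed from its five value counts: 347,851 lowercase letters, 36,457 uppercase letters and 11,475 spaces, 395,783 characters in all. A sketch; the helper and the code-to-name mapping are illustrative, and script and block membership are left out because `unicodedata` does not expose them:

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}

def category_counts(value_counts: dict[str, int]) -> Counter:
    """Count characters per Unicode general category, weighting each string by its frequency."""
    totals: Counter = Counter()
    for text, freq in value_counts.items():
        for ch in text:
            code = unicodedata.category(ch)
            totals[CATEGORY_NAMES.get(code, code)] += freq
    return totals

income_types = {
    "Working": 18819,
    "Commercial associate": 8490,
    "Pensioner": 6152,
    "State servant": 2985,
    "Student": 11,
}
print(category_counts(income_types))
```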

NAME_EDUCATION_TYPE
Categorical

Imbalance 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 MiB
Secondary / secondary special
24777 
Higher education
9864 
Incomplete higher
 
1410
Lower secondary
 
374
Academic degree
 
32

Length

Max length: 29
Median length: 29
Mean length: 24.862633
Min length: 15

Characters and Unicode

Total characters: 906,417
Distinct characters: 25
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Higher education
2nd row: Higher education
3rd row: Secondary / secondary special
4th row: Secondary / secondary special
5th row: Secondary / secondary special

Common Values

ValueCountFrequency (%)
Secondary / secondary special 24777
68.0%
Higher education 9864
 
27.1%
Incomplete higher 1410
 
3.9%
Lower secondary 374
 
1.0%
Academic degree 32
 
0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
secondary 49928
40.8%
24777
20.2%
special 24777
20.2%
higher 11274
 
9.2%
education 9864
 
8.1%
incomplete 1410
 
1.2%
lower 374
 
0.3%
academic 32
 
< 0.1%
degree 32
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 99165
10.9%
c 86043
9.5%
(space) 86011
9.5%
a 84601
9.3%
r 61608
 
6.8%
o 61576
 
6.8%
n 61202
 
6.8%
d 59856
 
6.6%
y 49928
 
5.5%
s 49928
 
5.5%
Other values (15) 206499
22.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 759172
83.8%
Space Separator 86011
 
9.5%
Uppercase Letter 36457
 
4.0%
Other Punctuation 24777
 
2.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 99165
13.1%
c 86043
11.3%
a 84601
11.1%
r 61608
8.1%
o 61576
8.1%
n 61202
8.1%
d 59856
7.9%
y 49928
6.6%
s 49928
6.6%
i 45947
6.1%
Other values (8) 99318
13.1%
Uppercase Letter
ValueCountFrequency (%)
S 24777
68.0%
H 9864
 
27.1%
I 1410
 
3.9%
L 374
 
1.0%
A 32
 
0.1%
Space Separator
ValueCountFrequency (%)
(space) 86011
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 24777
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 795629
87.8%
Common 110788
 
12.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 99165
12.5%
c 86043
10.8%
a 84601
10.6%
r 61608
7.7%
o 61576
7.7%
n 61202
7.7%
d 59856
7.5%
y 49928
 
6.3%
s 49928
 
6.3%
i 45947
 
5.8%
Other values (13) 135775
17.1%
Common
ValueCountFrequency (%)
(space) 86011
77.6%
/ 24777
 
22.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 906417
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 99165
10.9%
c 86043
9.5%
(space) 86011
9.5%
a 84601
9.3%
r 61608
 
6.8%
o 61576
 
6.8%
n 61202
 
6.8%
d 59856
 
6.6%
y 49928
 
5.5%
s 49928
 
5.5%
Other values (15) 206499
22.8%
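In the NAME_EDUCATION_TYPE word table, one row has no visible label but a count of 24,777, the number of "Secondary / secondary special" rows. That is consistent with a tokenizer that lowercases each value, splits on whitespace and strips punctuation from each token, which turns the "/" separator into an empty string. The sketch below assumes that tokenization (it is an illustration, not ydata-profiling's code) and reproduces the word counts shown, including secondary = 49,928, i.e. 40.8% of 122,468 tokens:

```python
import string
from collections import Counter

def word_counts(value_counts: dict[str, int]) -> Counter:
    """Lowercase, split on whitespace, strip surrounding punctuation; weight by frequency."""
    words: Counter = Counter()
    for text, freq in value_counts.items():
        for token in text.lower().split():
            words[token.strip(string.punctuation)] += freq  # "/" becomes ""
    return words

education = {
    "Secondary / secondary special": 24777,
    "Higher education": 9864,
    "Incomplete higher": 1410,
    "Lower secondary": 374,
    "Academic degree": 32,
}
for word, count in word_counts(education).most_common(4):
    print(repr(word), count)
```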

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 MiB
Married
25048 
Single / not married
4829 
Civil marriage
2945 
Separated
 
2103
Widow
 
1532

Length

Max length: 20
Median length: 7
Mean length: 9.3187317
Min length: 5

Characters and Unicode

Total characters: 339,733
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Civil marriage
2nd row: Civil marriage
3rd row: Married
4th row: Single / not married
5th row: Single / not married

Common Values

ValueCountFrequency (%)
Married 25048
68.7%
Single / not married 4829
 
13.2%
Civil marriage 2945
 
8.1%
Separated 2103
 
5.8%
Widow 1532
 
4.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
married 29877
55.4%
single 4829
 
9.0%
4829
 
9.0%
not 4829
 
9.0%
civil 2945
 
5.5%
marriage 2945
 
5.5%
separated 2103
 
3.9%
widow 1532
 
2.8%

Most occurring characters

ValueCountFrequency (%)
r 67747
19.9%
i 45073
13.3%
e 41857
12.3%
a 39973
11.8%
d 33512
9.9%
M 25048
 
7.4%
(space) 17432
 
5.1%
n 9658
 
2.8%
g 7774
 
2.3%
l 7774
 
2.3%
Other values (10) 43885
12.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 281015
82.7%
Uppercase Letter 36457
 
10.7%
Space Separator 17432
 
5.1%
Other Punctuation 4829
 
1.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 67747
24.1%
i 45073
16.0%
e 41857
14.9%
a 39973
14.2%
d 33512
11.9%
n 9658
 
3.4%
g 7774
 
2.8%
l 7774
 
2.8%
m 7774
 
2.8%
t 6932
 
2.5%
Other values (4) 12941
 
4.6%
Uppercase Letter
ValueCountFrequency (%)
M 25048
68.7%
S 6932
 
19.0%
C 2945
 
8.1%
W 1532
 
4.2%
Space Separator
ValueCountFrequency (%)
(space) 17432
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 4829
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 317472
93.4%
Common 22261
 
6.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 67747
21.3%
i 45073
14.2%
e 41857
13.2%
a 39973
12.6%
d 33512
10.6%
M 25048
 
7.9%
n 9658
 
3.0%
g 7774
 
2.4%
l 7774
 
2.4%
m 7774
 
2.4%
Other values (8) 31282
9.9%
Common
ValueCountFrequency (%)
(space) 17432
78.3%
/ 4829
 
21.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 339733
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 67747
19.9%
i 45073
13.3%
e 41857
12.3%
a 39973
11.8%
d 33512
9.9%
M 25048
 
7.4%
(space) 17432
 
5.1%
n 9658
 
2.8%
g 7774
 
2.3%
l 7774
 
2.3%
Other values (10) 43885
12.9%

NAME_HOUSING_TYPE
Categorical

Imbalance 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB
House / apartment
32548 
With parents
 
1776
Municipal apartment
 
1128
Rented apartment
 
575
Office apartment
 
262

Length

Max length: 19
Median length: 17
Mean length: 16.786132
Min length: 12

Characters and Unicode

Total characters: 611,972
Distinct characters: 25
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Rented apartment
2nd row: Rented apartment
3rd row: House / apartment
4th row: House / apartment
5th row: House / apartment

Common Values

ValueCountFrequency (%)
House / apartment 32548
89.3%
With parents 1776
 
4.9%
Municipal apartment 1128
 
3.1%
Rented apartment 575
 
1.6%
Office apartment 262
 
0.7%
Co-op apartment 168
 
0.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
apartment 34681
32.9%
house 32548
30.9%
32548
30.9%
with 1776
 
1.7%
parents 1776
 
1.7%
municipal 1128
 
1.1%
rented 575
 
0.5%
office 262
 
0.2%
co-op 168
 
0.2%

Most occurring characters

ValueCountFrequency (%)
t 73489
12.0%
a 72266
11.8%
e 70417
11.5%
(space) 69005
11.3%
n 38160
 
6.2%
p 37753
 
6.2%
r 36457
 
6.0%
m 34681
 
5.7%
s 34324
 
5.6%
u 33676
 
5.5%
Other values (15) 111744
18.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 473794
77.4%
Space Separator 69005
 
11.3%
Uppercase Letter 36457
 
6.0%
Other Punctuation 32548
 
5.3%
Dash Punctuation 168
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 73489
15.5%
a 72266
15.3%
e 70417
14.9%
n 38160
8.1%
p 37753
8.0%
r 36457
7.7%
m 34681
7.3%
s 34324
7.2%
u 33676
7.1%
o 32884
6.9%
Other values (6) 9687
 
2.0%
Uppercase Letter
ValueCountFrequency (%)
H 32548
89.3%
W 1776
 
4.9%
M 1128
 
3.1%
R 575
 
1.6%
O 262
 
0.7%
C 168
 
0.5%
Space Separator
ValueCountFrequency (%)
(space) 69005
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 32548
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 510251
83.4%
Common 101721
 
16.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 73489
14.4%
a 72266
14.2%
e 70417
13.8%
n 38160
7.5%
p 37753
7.4%
r 36457
7.1%
m 34681
6.8%
s 34324
6.7%
u 33676
6.6%
o 32884
6.4%
Other values (12) 46144
9.0%
Common
ValueCountFrequency (%)
(space) 69005
67.8%
/ 32548
32.0%
- 168
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 611972
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 73489
12.0%
a 72266
11.8%
e 70417
11.5%
(space) 69005
11.3%
n 38160
 
6.2%
p 37753
 
6.2%
r 36457
 
6.0%
m 34681
 
5.7%
s 34324
 
5.6%
u 33676
 
5.5%
Other values (15) 111744
18.3%

DAYS_BIRTH
Real number (ℝ)

Distinct: 7183
Distinct (%): 19.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -15975.173
Minimum: -25152
Maximum: -7489
Zeros: 0
Zeros (%): 0.0%
Negative: 36457
Negative (%): 100.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: -25152
5th percentile: -23019
Q1: -19438
Median: -15563
Q3: -12462
95th percentile: -9874
Maximum: -7489
Range: 17663
Interquartile range (IQR): 6976

Descriptive statistics

Standard deviation: 4200.5499
Coefficient of variation (CV): -0.26294237
Kurtosis: -1.0456436
Mean: -15975.173
Median Absolute Deviation (MAD): 3425
Skewness: -0.18422965
Sum: -5.824069 × 10⁸
Variance: 17644620
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value Count Frequency (%)
-12676 54
 
0.1%
-15519 54
 
0.1%
-16896 38
 
0.1%
-14667 37
 
0.1%
-15140 32
 
0.1%
-16768 32
 
0.1%
-15675 32
 
0.1%
-14136 30
 
0.1%
-13788 30
 
0.1%
-10182 29
 
0.1%
Other values (7173) 36089
99.0%

Minimum 10 values
Value Count Frequency (%)
-25152 2
< 0.1%
-25140 3
< 0.1%
-25099 1
 
< 0.1%
-25088 1
 
< 0.1%
-25010 2
< 0.1%
-24970 2
< 0.1%
-24963 1
 
< 0.1%
-24946 3
< 0.1%
-24932 4
< 0.1%
-24914 3
< 0.1%

Maximum 10 values
Value Count Frequency (%)
-7489 1
 
< 0.1%
-7705 1
 
< 0.1%
-7723 2
< 0.1%
-7757 4
< 0.1%
-7959 2
< 0.1%
-7980 1
 
< 0.1%
-8041 4
< 0.1%
-8054 1
 
< 0.1%
-8056 2
< 0.1%
-8067 1
 
< 0.1%
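Every DAYS_BIRTH value is negative (Negative (%): 100.0%), which fits a convention of counting days backwards from a reference date such as the application date. The report does not define that reference, so this reading is an assumption. Under it, and with an assumed 365.25-day year, the range translates into ages of roughly 20.5 to 68.9 years with a mean near 43.7:

```python
DAYS_PER_YEAR = 365.25  # average Gregorian year; an assumed conversion factor

def days_before_to_years(days: float) -> float:
    """Convert a non-positive 'days before reference date' offset into positive years."""
    if days > 0:
        raise ValueError("expected a non-positive day offset")
    return -days / DAYS_PER_YEAR

# Minimum, mean and maximum copied from the DAYS_BIRTH statistics above.
for label, days in [("youngest", -7489), ("mean", -15975.173), ("oldest", -25152)]:
    print(f"{label}: {days_before_to_years(days):.1f} years")
```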

DAYS_EMPLOYED
Real number (ℝ)

High correlation 

Distinct: 3640
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 59262.936
Minimum: -15713
Maximum: 365243
Zeros: 0
Zeros (%): 0.0%
Negative: 30322
Negative (%): 83.2%
Memory size: 284.9 KiB

Quantile statistics

Minimum: -15713
5th percentile: -7205
Q1: -3153
Median: -1552
Q3: -408
95th percentile: 365243
Maximum: 365243
Range: 380956
Interquartile range (IQR): 2745

Descriptive statistics

Standard deviation: 137651.33
Coefficient of variation (CV): 2.3227222
Kurtosis: 1.1433987
Mean: 59262.936
Median Absolute Deviation (MAD): 1309
Skewness: 1.7724432
Sum: 2.1605488 × 10⁹
Variance: 1.894789 × 10¹⁰
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value Count Frequency (%)
365243 6135
 
16.8%
-401 78
 
0.2%
-1539 64
 
0.2%
-200 63
 
0.2%
-1678 61
 
0.2%
-2087 61
 
0.2%
-2531 56
 
0.2%
-460 54
 
0.1%
-1160 53
 
0.1%
-2057 52
 
0.1%
Other values (3630) 29780
81.7%

Minimum 10 values
Value Count Frequency (%)
-15713 1
 
< 0.1%
-15661 4
 
< 0.1%
-15227 1
 
< 0.1%
-15072 3
 
< 0.1%
-15038 16
< 0.1%
-14887 6
 
< 0.1%
-14810 8
< 0.1%
-14775 2
 
< 0.1%
-14536 4
 
< 0.1%
-14473 6
 
< 0.1%

Maximum 10 values
Value Count Frequency (%)
365243 6135
16.8%
-17 3
 
< 0.1%
-43 1
 
< 0.1%
-65 2
 
< 0.1%
-66 1
 
< 0.1%
-70 4
 
< 0.1%
-71 1
 
< 0.1%
-73 17
 
< 0.1%
-78 1
 
< 0.1%
-79 1
 
< 0.1%
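DAYS_EMPLOYED mixes two kinds of values: 30,322 negative offsets and 6,135 rows (16.8%) holding exactly 365243, which together account for all 36,457 observations. 365,243 days is about 1,000 years, so it cannot be a real employment duration and is most plausibly a placeholder. Its high correlation with NAME_INCOME_TYPE, whose Pensioner level is of similar size (6,152 rows), hints at what it encodes, but that remains an assumption. Because the placeholder pulls the mean to 59,262.9 while the median is -1,552, it is often replaced with a missing value before analysis. A stdlib sketch; the constant, the function and the sample are illustrative:

```python
import statistics
from typing import Optional

SENTINEL = 365243  # placeholder value seen in DAYS_EMPLOYED (its meaning is assumed)

def clean_days_employed(values: list[int]) -> tuple[list[Optional[int]], float]:
    """Replace the sentinel with None; return the cleaned list and the sentinel share."""
    cleaned = [None if v == SENTINEL else v for v in values]
    share = sum(v is None for v in cleaned) / len(values)
    return cleaned, share

# Hypothetical six-row sample built from values that appear in the tables above.
sample = [-401, -1539, SENTINEL, -200, SENTINEL, -2087]
cleaned, share = clean_days_employed(sample)
observed = [v for v in cleaned if v is not None]
print(cleaned)  # [-401, -1539, None, -200, None, -2087]
print(f"sentinel share: {share:.1%}, median of the rest: {statistics.median(observed)}")
```

On the full column the sentinel share is 6,135 / 36,457, the 16.8% shown in the Common values table.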

FLAG_MOBIL
Categorical

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
1
36457 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 36,457
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

ValueCountFrequency (%)
1 36457
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1 36457
100.0%

Most occurring characters

ValueCountFrequency (%)
1 36457
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 36457
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 36457
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 36457
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 36457
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 36457
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 36457
100.0%
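FLAG_MOBIL holds the same value, "1", in all 36,457 rows, so it cannot help distinguish records and is a natural candidate to drop before modelling. A minimal sketch of a constant-column check over a column-oriented table; the toy table is hypothetical:

```python
def constant_columns(table: dict[str, list]) -> list[str]:
    """Return names of columns whose non-missing values are all identical.

    A column with no non-missing values at all is also reported as constant.
    """
    constant = []
    for name, values in table.items():
        distinct = {v for v in values if v is not None}
        if len(distinct) <= 1:
            constant.append(name)
    return constant

# Hypothetical three-row table shaped like this dataset's flag columns.
toy = {
    "FLAG_MOBIL": ["1", "1", "1"],
    "FLAG_PHONE": ["0", "0", "1"],
    "FLAG_EMAIL": ["0", "1", "0"],
}
print(constant_columns(toy))  # ['FLAG_MOBIL']
```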

FLAG_WORK_PHONE
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
0
28235 
1
8222 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 36,457
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

ValueCountFrequency (%)
0 28235
77.4%
1 8222
 
22.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 28235
77.4%
1 8222
 
22.6%

Most occurring characters

ValueCountFrequency (%)
0 28235
77.4%
1 8222
 
22.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 36457
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 28235
77.4%
1 8222
 
22.6%

Most occurring scripts

ValueCountFrequency (%)
Common 36457
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 28235
77.4%
1 8222
 
22.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 36457
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 28235
77.4%
1 8222
 
22.6%

FLAG_PHONE
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
0
25709 
1
10748 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 36,457
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 1

Common Values

ValueCountFrequency (%)
0 25709
70.5%
1 10748
29.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 25709
70.5%
1 10748
29.5%

Most occurring characters

ValueCountFrequency (%)
0 25709
70.5%
1 10748
29.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 36457
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 25709
70.5%
1 10748
29.5%

Most occurring scripts

ValueCountFrequency (%)
Common 36457
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 25709
70.5%
1 10748
29.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 36457
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 25709
70.5%
1 10748
29.5%

FLAG_EMAIL
Categorical

Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
0
33186 
1
 
3271

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 36,457
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 1

Common Values

ValueCountFrequency (%)
0 33186
91.0%
1 3271
 
9.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 33186
91.0%
1 3271
 
9.0%

Most occurring characters

ValueCountFrequency (%)
0 33186
91.0%
1 3271
 
9.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 36457
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 33186
91.0%
1 3271
 
9.0%

Most occurring scripts

ValueCountFrequency (%)
Common 36457
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 33186
91.0%
1 3271
 
9.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 36457
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 33186
91.0%
1 3271
 
9.0%

OCCUPATION_TYPE
Categorical

High correlation  Missing 

Distinct: 18
Distinct (%): 0.1%
Missing: 11323
Missing (%): 31.1%
Memory size: 2.3 MiB
Laborers
6211 
Core staff
3591 
Sales staff
3485 
Managers
3012 
Drivers
2138 
Other values (13)
6697 

Length

Max length: 21
Median length: 20
Mean length: 10.535768
Min length: 7

Characters and Unicode

Total characters: 264,806
Distinct characters: 36
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Security staff
2nd row: Sales staff
3rd row: Sales staff
4th row: Sales staff
5th row: Sales staff

Common Values

ValueCountFrequency (%)
Laborers 6211
17.0%
Core staff 3591
 
9.8%
Sales staff 3485
 
9.6%
Managers 3012
 
8.3%
Drivers 2138
 
5.9%
High skill tech staff 1383
 
3.8%
Accountants 1241
 
3.4%
Medicine staff 1207
 
3.3%
Cooking staff 655
 
1.8%
Security staff 592
 
1.6%
Other values (8) 1619
 
4.4%
(Missing) 11323
31.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
staff 12127
29.9%
laborers 6386
15.7%
core 3591
 
8.8%
sales 3485
 
8.6%
managers 3012
 
7.4%
drivers 2138
 
5.3%
high 1383
 
3.4%
skill 1383
 
3.4%
tech 1383
 
3.4%
accountants 1241
 
3.1%
Other values (13) 4496
 
11.1%

Most occurring characters

ValueCountFrequency (%)
a 30815
11.6%
s 30695
11.6%
r 25581
9.7%
e 25543
9.6%
f 24254
 
9.2%
t 17411
 
6.6%
(space) 15491
 
5.8%
o 12703
 
4.8%
i 10304
 
3.9%
n 8711
 
3.3%
Other values (26) 63298
23.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 223512
84.4%
Uppercase Letter 25454
 
9.6%
Space Separator 15491
 
5.8%
Dash Punctuation 175
 
0.1%
Other Punctuation 174
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 30815
13.8%
s 30695
13.7%
r 25581
11.4%
e 25543
11.4%
f 24254
10.9%
t 17411
7.8%
o 12703
5.7%
i 10304
 
4.6%
n 8711
 
3.9%
l 7231
 
3.2%
Other values (11) 30264
13.5%
Uppercase Letter
ValueCountFrequency (%)
L 6561
25.8%
C 4797
18.8%
S 4228
16.6%
M 4219
16.6%
D 2138
 
8.4%
H 1468
 
5.8%
A 1241
 
4.9%
P 344
 
1.4%
W 174
 
0.7%
R 164
 
0.6%
Other values (2) 120
 
0.5%
Space Separator
ValueCountFrequency (%)
(space) 15491
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 175
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 174
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 248966
94.0%
Common 15840
 
6.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 30815
12.4%
s 30695
12.3%
r 25581
10.3%
e 25543
10.3%
f 24254
9.7%
t 17411
 
7.0%
o 12703
 
5.1%
i 10304
 
4.1%
n 8711
 
3.5%
l 7231
 
2.9%
Other values (23) 55718
22.4%
Common
ValueCountFrequency (%)
(space) 15491
97.8%
- 175
 
1.1%
/ 174
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 264806
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 30815
11.6%
s 30695
11.6%
r 25581
9.7%
e 25543
9.6%
f 24254
 
9.2%
t 17411
 
6.6%
(space) 15491
 
5.8%
o 12703
 
4.8%
i 10304
 
3.9%
n 8711
 
3.3%
Other values (26) 63298
23.9%
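OCCUPATION_TYPE is missing in 11,323 of 36,457 rows (31.1%), and these are the only missing cells in the dataset. The listed categories plus "Other values (8)" add up to the 25,134 non-missing rows, so one option that keeps every row is to treat absence as its own level. A sketch; the placeholder label "Unknown" and both helper names are hypothetical choices:

```python
from collections import Counter
from typing import Optional

def fill_missing_category(values: list[Optional[str]], label: str = "Unknown") -> list[str]:
    """Replace None with an explicit category label so missingness is kept as information."""
    return [label if v is None else v for v in values]

def missing_share(values: list[Optional[str]]) -> float:
    """Fraction of entries that are None."""
    return sum(v is None for v in values) / len(values)

# Hypothetical sample using category names from the table above.
sample = ["Laborers", None, "Core staff", "Sales staff", None]
print(f"{missing_share(sample):.0%}")  # 40%
print(Counter(fill_missing_category(sample)))
```

Whether a separate level is better than dropping or imputing depends on the model; the strong correlation with CODE_GENDER flagged in the alerts is one reason not to assume the values are missing at random.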

CNT_FAM_MEMBERS
Real number (ℝ)

High correlation 

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.198453
Minimum: 1
Maximum: 20
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 2
Q3: 3
95th percentile: 4
Maximum: 20
Range: 19
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.91168614
Coefficient of variation (CV): 0.4146944
Kurtosis: 8.1886954
Mean: 2.198453
Median Absolute Deviation (MAD): 0
Skewness: 1.2985959
Sum: 80149
Variance: 0.83117162
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)

Common values
Value Count Frequency (%)
2 19463
53.4%
1 6987
 
19.2%
3 6421
 
17.6%
4 3106
 
8.5%
5 397
 
1.1%
6 58
 
0.2%
7 19
 
0.1%
15 3
 
< 0.1%
9 2
 
< 0.1%
20 1
 
< 0.1%

Minimum 10 values
Value Count Frequency (%)
1 6987
 
19.2%
2 19463
53.4%
3 6421
 
17.6%
4 3106
 
8.5%
5 397
 
1.1%
6 58
 
0.2%
7 19
 
0.1%
9 2
 
< 0.1%
15 3
 
< 0.1%
20 1
 
< 0.1%

Maximum 10 values
Value Count Frequency (%)
20 1
 
< 0.1%
15 3
 
< 0.1%
9 2
 
< 0.1%
7 19
 
0.1%
6 58
 
0.2%
5 397
 
1.1%
4 3106
 
8.5%
3 6421
 
17.6%
2 19463
53.4%
1 6987
 
19.2%
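
The ten values above account for all 36,457 observations, so the summary statistics for CNT_FAM_MEMBERS can be cross-checked against them. Below is a minimal sketch that uses only the Python standard library. It assumes the counts are exact and that the report's quantiles use linear interpolation, which is what `statistics.quantiles(..., method="inclusive")` computes. The name `fam_counts` is illustrative.

```python
import statistics

# Value counts for CNT_FAM_MEMBERS, copied from the tables above (value -> count).
fam_counts = {1: 6987, 2: 19463, 3: 6421, 4: 3106, 5: 397,
              6: 58, 7: 19, 9: 2, 15: 3, 20: 1}

# Expand the counts back into one value per observation, in ascending order.
values = [v for v, c in sorted(fam_counts.items()) for _ in range(c)]

n = len(values)
total = sum(values)
mean = total / n
median = statistics.median(values)

# Linear-interpolation quantiles: n=4 gives quartiles, n=20 gives 5% steps.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
twentieths = statistics.quantiles(values, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

variance = statistics.variance(values)  # sample variance (n - 1 denominator)
cv = statistics.stdev(values) / mean    # coefficient of variation
mad = statistics.median([abs(x - median) for x in values])

print(f"n={n} sum={total} mean={mean:.6f} median={median}")
print(f"Q1={q1} Q3={q3} p5={p5} p95={p95} IQR={q3 - q1}")
print(f"variance={variance:.8f} CV={cv:.7f} MAD={mad}")
```

The MAD comes out as 0 because more than half of the observations (53.4%) equal the median value, 2.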

account_age_months
Real number (ℝ)

Distinct: 61
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.164193
Minimum: 0
Maximum: 60
Zeros: 315
Zeros (%): 0.9%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 3
Q1: 12
Median: 24
Q3: 39
95th percentile: 55
Maximum: 60
Range: 60
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 16.501854
Coefficient of variation (CV): 0.63070373
Kurtosis: -1.0377619
Mean: 26.164193
Median absolute deviation (MAD): 14
Skewness: 0.28639457
Sum: 953868
Variance: 272.3112
Monotonicity: not monotonic
Figure: histogram with fixed-size bins (bins=50)

Common values
Value Count Frequency (%)
7 889 2.4%
11 828 2.3%
6 824 2.3%
8 820 2.2%
5 816 2.2%
17 807 2.2%
3 800 2.2%
10 798 2.2%
16 785 2.2%
15 774 2.1%
Other values (51) 28316 77.7%

Minimum 10 values
Value Count Frequency (%)
0 315 0.9%
1 551 1.5%
2 643 1.8%
3 800 2.2%
4 765 2.1%
5 816 2.2%
6 824 2.3%
7 889 2.4%
8 820 2.2%
9 770 2.1%

Maximum 10 values
Value Count Frequency (%)
60 321 0.9%
59 307 0.8%
58 333 0.9%
57 304 0.8%
56 345 0.9%
55 368 1.0%
54 358 1.0%
53 377 1.0%
52 463 1.3%
51 476 1.3%
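
Several of the account_age_months figures above are derived from others, which makes a quick consistency check possible. Below is a minimal sketch that uses only values copied from this section. Because the inputs are the rounded mean and standard deviation, the recomputed CV and variance agree with the report only to its printed precision.

```python
# Summary figures for account_age_months, copied from the statistics above.
n = 36457
mean, std = 26.164193, 16.501854
minimum, q1, q3, maximum = 0, 12, 39, 60
top10_counts = [889, 828, 824, 820, 816, 807, 800, 798, 785, 774]  # Common values table

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
other = n - sum(top10_counts)    # rows outside the ten most common values
other_pct = 100 * other / n

print(iqr, value_range, round(cv, 8), round(variance, 4), other, round(other_pct, 1))
```

The `other` count reproduces the "Other values (51)" row: 61 distinct values minus the 10 shown.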

is_high_risk
Categorical

Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
0 36075
1 382

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 36,457
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value Count Frequency (%)
0 36075 99.0%
1 382 1.0%
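
The 91.6% imbalance figure for is_high_risk, given in the overview alerts, can be reproduced from the two counts above. This assumes the score is defined as one minus the Shannon entropy of the category frequencies, normalised by its maximum ln(k). That definition is an assumption, since the report text does not state it. Below is a minimal sketch with the standard library. `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / ln(k), where H is the Shannon entropy of the category
    frequencies and ln(k) is its maximum for k categories.
    0 means perfectly balanced; values near 1 mean one category dominates."""
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two categories")
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

score = imbalance_score([36075, 382])  # is_high_risk: counts of 0 and 1
print(f"{score:.1%}")
```

An evenly split two-category column such as `[10, 10]` scores 0. The score approaches 1 as one category takes over.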

Length

Figure: histogram of lengths of the category

Common Values (Plot)

Figure: plot of the common values listed above

Most occurring characters

Value Count Frequency (%)
0 36075 99.0%
1 382 1.0%

Most occurring categories

Value Count Frequency (%)
Decimal Number 36457 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 36075 99.0%
1 382 1.0%

Most occurring scripts

Value Count Frequency (%)
Common 36457 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
0 36075 99.0%
1 382 1.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 36457 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 36075 99.0%
1 382 1.0%

Interactions

Figure: 7 × 7 grid of pairwise interaction plots for the numeric variables

Correlations

Figure: correlation heatmap
Variable | AMT_INCOME_TOTAL | CNT_CHILDREN | CNT_FAM_MEMBERS | CODE_GENDER | DAYS_BIRTH | DAYS_EMPLOYED | FLAG_EMAIL | FLAG_OWN_CAR | FLAG_OWN_REALTY | FLAG_PHONE | FLAG_WORK_PHONE | ID | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | NAME_INCOME_TYPE | OCCUPATION_TYPE | account_age_months | is_high_risk
AMT_INCOME_TOTAL | 1.000 | 0.044 | 0.022 | 0.201 | 0.095 | -0.163 | 0.086 | 0.205 | 0.041 | 0.047 | 0.040 | -0.021 | 0.109 | 0.033 | 0.054 | 0.098 | 0.112 | 0.028 | 0.000
CNT_CHILDREN | 0.044 | 1.000 | 0.826 | 0.063 | 0.380 | -0.143 | 0.007 | 0.086 | 0.007 | 0.019 | 0.055 | 0.029 | 0.017 | 0.077 | 0.032 | 0.071 | 0.055 | 0.005 | 0.000
CNT_FAM_MEMBERS | 0.022 | 0.826 | 1.000 | 0.104 | 0.306 | -0.147 | 0.030 | 0.118 | 0.019 | 0.023 | 0.055 | 0.027 | 0.031 | 0.154 | 0.066 | 0.120 | 0.057 | 0.026 | 0.000
CODE_GENDER | 0.201 | 0.063 | 0.104 | 1.000 | 0.210 | 0.175 | 0.000 | 0.361 | 0.050 | 0.026 | 0.065 | 0.050 | 0.020 | 0.161 | 0.087 | 0.191 | 0.557 | 0.018 | 0.012
DAYS_BIRTH | 0.095 | 0.380 | 0.306 | 0.210 | 1.000 | -0.209 | 0.110 | 0.168 | 0.134 | 0.066 | 0.197 | 0.056 | 0.126 | 0.166 | 0.113 | 0.377 | 0.098 | -0.053 | 0.003
DAYS_EMPLOYED | -0.163 | -0.143 | -0.147 | 0.175 | -0.209 | 1.000 | 0.086 | 0.157 | 0.093 | 0.004 | 0.243 | -0.008 | 0.149 | 0.210 | 0.114 | 0.998 | 1.000 | -0.080 | 0.002
FLAG_EMAIL | 0.086 | 0.007 | 0.030 | 0.000 | 0.110 | 0.086 | 1.000 | 0.021 | 0.052 | 0.009 | 0.034 | 0.164 | 0.100 | 0.031 | 0.029 | 0.110 | 0.091 | 0.013 | 0.016
FLAG_OWN_CAR | 0.205 | 0.086 | 0.118 | 0.361 | 0.168 | 0.157 | 0.021 | 1.000 | 0.014 | 0.013 | 0.021 | 0.061 | 0.103 | 0.155 | 0.041 | 0.162 | 0.274 | 0.041 | 0.000
FLAG_OWN_REALTY | 0.041 | 0.007 | 0.019 | 0.050 | 0.134 | 0.093 | 0.052 | 0.014 | 1.000 | 0.066 | 0.208 | 0.185 | 0.041 | 0.031 | 0.207 | 0.095 | 0.052 | 0.012 | 0.001
FLAG_PHONE | 0.047 | 0.019 | 0.023 | 0.026 | 0.066 | 0.004 | 0.009 | 0.013 | 0.066 | 1.000 | 0.312 | 0.063 | 0.054 | 0.044 | 0.038 | 0.010 | 0.070 | 0.016 | 0.000
FLAG_WORK_PHONE | 0.040 | 0.055 | 0.055 | 0.065 | 0.197 | 0.243 | 0.034 | 0.021 | 0.208 | 0.312 | 1.000 | 0.121 | 0.048 | 0.064 | 0.036 | 0.257 | 0.062 | 0.022 | 0.004
ID | -0.021 | 0.029 | 0.027 | 0.050 | 0.056 | -0.008 | 0.164 | 0.061 | 0.185 | 0.063 | 0.121 | 1.000 | 0.042 | 0.043 | 0.033 | 0.047 | 0.067 | -0.001 | 0.016
NAME_EDUCATION_TYPE | 0.109 | 0.017 | 0.031 | 0.020 | 0.126 | 0.149 | 0.100 | 0.103 | 0.041 | 0.054 | 0.048 | 0.042 | 1.000 | 0.045 | 0.052 | 0.100 | 0.206 | 0.014 | 0.006
NAME_FAMILY_STATUS | 0.033 | 0.077 | 0.154 | 0.161 | 0.166 | 0.210 | 0.031 | 0.155 | 0.031 | 0.044 | 0.064 | 0.043 | 0.045 | 1.000 | 0.056 | 0.108 | 0.105 | 0.030 | 0.001
NAME_HOUSING_TYPE | 0.054 | 0.032 | 0.066 | 0.087 | 0.113 | 0.114 | 0.029 | 0.041 | 0.207 | 0.038 | 0.036 | 0.033 | 0.052 | 0.056 | 1.000 | 0.062 | 0.071 | 0.014 | 0.000
NAME_INCOME_TYPE | 0.098 | 0.071 | 0.120 | 0.191 | 0.377 | 0.998 | 0.110 | 0.162 | 0.095 | 0.010 | 0.257 | 0.047 | 0.100 | 0.108 | 0.062 | 1.000 | 0.180 | 0.013 | 0.010
OCCUPATION_TYPE | 0.112 | 0.055 | 0.057 | 0.557 | 0.098 | 1.000 | 0.091 | 0.274 | 0.052 | 0.070 | 0.062 | 0.067 | 0.206 | 0.105 | 0.071 | 0.180 | 1.000 | 0.024 | 0.042
account_age_months | 0.028 | 0.005 | 0.026 | 0.018 | -0.053 | -0.080 | 0.013 | 0.041 | 0.012 | 0.016 | 0.022 | -0.001 | 0.014 | 0.030 | 0.014 | 0.013 | 0.024 | 1.000 | 0.063
is_high_risk | 0.000 | 0.000 | 0.000 | 0.012 | 0.003 | 0.002 | 0.016 | 0.000 | 0.001 | 0.000 | 0.004 | 0.016 | 0.006 | 0.001 | 0.000 | 0.010 | 0.042 | 0.063 | 1.000
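
In this table, negative values occur only between pairs of numeric variables, for example DAYS_EMPLOYED and DAYS_BIRTH at -0.209. Every pair involving a categorical variable lies between 0 and 1. That pattern is consistent with a signed rank correlation for numeric pairs and an unsigned association measure such as Cramér's V for the rest. This is an assumption, since the report does not name its methods here, and the report may apply a bias-corrected variant. The sketch below uses the plain formula on hypothetical toy tables. It shows how a value of 1.000, such as the one between DAYS_EMPLOYED and OCCUPATION_TYPE, can arise when the categories of one variable map one-to-one onto those of the other.

```python
import math

def cramers_v(table):
    """Uncorrected Cramér's V for an r x c contingency table (list of rows).

    V = sqrt(chi2 / (n * (min(r, c) - 1))), where chi2 is Pearson's
    chi-squared statistic against the independence expectation.
    """
    r, c = len(table), len(table[0])
    if min(r, c) < 2:
        raise ValueError("need at least a 2 x 2 table")
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(r, c) - 1)))

# Hypothetical toy tables, not taken from the dataset.
one_to_one = [[30, 0, 0], [0, 20, 0], [0, 0, 50]]  # each row category maps to one column
independent = [[10, 20], [30, 60]]                  # same column proportions in every row

print(cramers_v(one_to_one))   # 1.0
print(cramers_v(independent))  # 0.0
```

Because the measure ignores direction, it cannot produce negative values. That property is why only the numeric-numeric cells in the table carry a sign.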

Missing values

Figure: a simple visualization of nullity by column.
Figure: nullity matrix. The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
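
A column-wise nullity chart plots how many values are present in each column. Below is a minimal sketch of that count in plain Python, applied to ID and OCCUPATION_TYPE for the first ten rows of the Sample section. The helper `non_null_counts` is hypothetical, and `None` stands in for NaN.

```python
# ID and OCCUPATION_TYPE for the first ten rows shown in the Sample section.
rows = [
    {"ID": 5008804, "OCCUPATION_TYPE": None},
    {"ID": 5008805, "OCCUPATION_TYPE": None},
    {"ID": 5008806, "OCCUPATION_TYPE": "Security staff"},
    {"ID": 5008808, "OCCUPATION_TYPE": "Sales staff"},
    {"ID": 5008809, "OCCUPATION_TYPE": "Sales staff"},
    {"ID": 5008810, "OCCUPATION_TYPE": "Sales staff"},
    {"ID": 5008811, "OCCUPATION_TYPE": "Sales staff"},
    {"ID": 5008812, "OCCUPATION_TYPE": None},
    {"ID": 5008813, "OCCUPATION_TYPE": None},
    {"ID": 5008814, "OCCUPATION_TYPE": None},
]

def non_null_counts(records):
    """Count the non-missing values in each column of a list of row dicts."""
    counts = {}
    for record in records:
        for column, value in record.items():
            # True counts as 1 and False as 0 when added to an int.
            counts[column] = counts.get(column, 0) + (value is not None)
    return counts

counts = non_null_counts(rows)
missing_pct = {col: 100 * (len(rows) - c) / len(rows) for col, c in counts.items()}
print(counts)
print(missing_pct)
```

In this small slice, half of the OCCUPATION_TYPE values are missing. The overview alerts report 31.1% missing across the full 36,457 rows.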

Sample

First rows

Index | ID | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | CNT_CHILDREN | AMT_INCOME_TOTAL | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | DAYS_BIRTH | DAYS_EMPLOYED | FLAG_MOBIL | FLAG_WORK_PHONE | FLAG_PHONE | FLAG_EMAIL | OCCUPATION_TYPE | CNT_FAM_MEMBERS | account_age_months | is_high_risk
0 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2.0 | 15 | 0
1 | 5008805 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2.0 | 14 | 0
2 | 5008806 | M | Y | Y | 0 | 112500.0 | Working | Secondary / secondary special | Married | House / apartment | -21474 | -1134 | 1 | 0 | 0 | 0 | Security staff | 2.0 | 29 | 0
3 | 5008808 | F | N | Y | 0 | 270000.0 | Commercial associate | Secondary / secondary special | Single / not married | House / apartment | -19110 | -3051 | 1 | 0 | 1 | 1 | Sales staff | 1.0 | 4 | 0
4 | 5008809 | F | N | Y | 0 | 270000.0 | Commercial associate | Secondary / secondary special | Single / not married | House / apartment | -19110 | -3051 | 1 | 0 | 1 | 1 | Sales staff | 1.0 | 26 | 0
5 | 5008810 | F | N | Y | 0 | 270000.0 | Commercial associate | Secondary / secondary special | Single / not married | House / apartment | -19110 | -3051 | 1 | 0 | 1 | 1 | Sales staff | 1.0 | 26 | 0
6 | 5008811 | F | N | Y | 0 | 270000.0 | Commercial associate | Secondary / secondary special | Single / not married | House / apartment | -19110 | -3051 | 1 | 0 | 1 | 1 | Sales staff | 1.0 | 38 | 0
7 | 5008812 | F | N | Y | 0 | 283500.0 | Pensioner | Higher education | Separated | House / apartment | -22464 | 365243 | 1 | 0 | 0 | 0 | NaN | 1.0 | 20 | 0
8 | 5008813 | F | N | Y | 0 | 283500.0 | Pensioner | Higher education | Separated | House / apartment | -22464 | 365243 | 1 | 0 | 0 | 0 | NaN | 1.0 | 16 | 0
9 | 5008814 | F | N | Y | 0 | 283500.0 | Pensioner | Higher education | Separated | House / apartment | -22464 | 365243 | 1 | 0 | 0 | 0 | NaN | 1.0 | 17 | 0

Last rows

Index | ID | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | CNT_CHILDREN | AMT_INCOME_TOTAL | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | DAYS_BIRTH | DAYS_EMPLOYED | FLAG_MOBIL | FLAG_WORK_PHONE | FLAG_PHONE | FLAG_EMAIL | OCCUPATION_TYPE | CNT_FAM_MEMBERS | account_age_months | is_high_risk
36447 | 5149145 | M | Y | Y | 0 | 247500.0 | Working | Secondary / secondary special | Married | House / apartment | -10952 | -3577 | 1 | 1 | 0 | 0 | Laborers | 2.0 | 25 | 0
36448 | 5149158 | M | Y | Y | 0 | 247500.0 | Working | Secondary / secondary special | Married | House / apartment | -10952 | -3577 | 1 | 1 | 0 | 0 | Laborers | 2.0 | 28 | 0
36449 | 5149190 | M | Y | N | 1 | 450000.0 | Working | Higher education | Married | House / apartment | -9847 | -502 | 1 | 0 | 1 | 1 | Core staff | 3.0 | 11 | 1
36450 | 5149729 | M | Y | Y | 0 | 90000.0 | Working | Secondary / secondary special | Married | House / apartment | -19101 | -1721 | 1 | 0 | 0 | 0 | NaN | 2.0 | 21 | 0
36451 | 5149775 | F | Y | Y | 0 | 130500.0 | Working | Secondary / secondary special | Married | House / apartment | -16137 | -9391 | 1 | 0 | 1 | 0 | Laborers | 2.0 | 19 | 0
36452 | 5149828 | M | Y | Y | 0 | 315000.0 | Working | Secondary / secondary special | Married | House / apartment | -17348 | -2420 | 1 | 0 | 0 | 0 | Managers | 2.0 | 11 | 1
36453 | 5149834 | F | N | Y | 0 | 157500.0 | Commercial associate | Higher education | Married | House / apartment | -12387 | -1325 | 1 | 0 | 1 | 1 | Medicine staff | 2.0 | 23 | 0
36454 | 5149838 | F | N | Y | 0 | 157500.0 | Pensioner | Higher education | Married | House / apartment | -12387 | -1325 | 1 | 0 | 1 | 1 | Medicine staff | 2.0 | 32 | 0
36455 | 5150049 | F | N | Y | 0 | 283500.0 | Working | Secondary / secondary special | Married | House / apartment | -17958 | -655 | 1 | 0 | 0 | 0 | Sales staff | 2.0 | 9 | 1
36456 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1.0 | 13 | 0